inequality problem
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
Zhao, Haoyu; Geng, Yihan; Tang, Shange; Lin, Yong; Lyu, Bohan; Lin, Hongzhou; Jin, Chi; Arora, Sanjeev
LLM-based formal proof assistants (e.g., in Lean) hold great promise for automating mathematical discovery. But beyond syntactic correctness, do these systems truly understand mathematical structure as humans do? We investigate this question in the context of mathematical inequalities, specifically the prover's ability to recognize that a given problem simplifies by applying a known inequality such as AM/GM. In particular, we are interested in this ability in a compositional setting, where multiple inequalities must be applied as part of a solution. We introduce Ineq-Comp, a benchmark built from elementary inequalities through systematic transformations, including variable duplication, algebraic rewriting, and multi-step composition. Although these problems remain easy for humans, we find that most provers, including Goedel, STP, and Kimina-7B, struggle significantly. DeepSeek-Prover-V2-7B shows relative robustness, but still suffers a 20% performance drop (pass@32). Even for the DeepSeek-Prover-V2-671B model, a gap between compositional variants and seed problems remains, implying that scaling up model size alone does not fully solve the compositional weakness. Strikingly, performance remains poor for all models even when formal proofs of the constituent parts are provided in context, revealing that the source of weakness is indeed in compositional reasoning. Our results expose a persisting gap between the generalization behavior of current AI provers and human mathematical intuition. All data and evaluation code can be found at https://github.com/haoyuzhao123/LeanIneqComp.
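As a rough illustration of the "variable duplication" idea mentioned in the abstract, the Lean 4 sketch below pairs a seed inequality with a duplicated variant. It assumes Mathlib is available. The theorem names `seed` and `seed_dup` and the specific statements are hypothetical and are not taken from the Ineq-Comp benchmark.

```lean
import Mathlib

-- Hypothetical seed problem: the two-variable AM-GM inequality
-- in the form a^2 + b^2 ≥ 2ab, which follows from (a - b)^2 ≥ 0.
theorem seed (a b : ℝ) : a ^ 2 + b ^ 2 ≥ 2 * a * b := by
  nlinarith [sq_nonneg (a - b)]

-- Hypothetical variable-duplication variant: the same inequality
-- stated twice on disjoint variables and summed. A human sees at once
-- that it is two copies of the seed; the proof just reuses `seed`.
theorem seed_dup (a b c d : ℝ) :
    (a ^ 2 + b ^ 2) + (c ^ 2 + d ^ 2) ≥ 2 * a * b + 2 * c * d := by
  have h1 := seed a b
  have h2 := seed c d
  linarith
```

The second proof is short only because it recognizes the compositional structure and applies the seed result to each part, which is the kind of step the abstract reports provers struggling with.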
How to solve AI's inequality problem
His 2014 book, coauthored with Andrew McAfee, is called The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. But he says the thinking of AI researchers has been too limited. "I talk to many researchers, and they say: 'Our job is to make a machine that is like a human.' It's a clear vision," he says. But, he adds, "it's also kind of a lazy, low bar."